Chapter 1 - Information Representation
1.1 Data Representation
Number Systems →
-
Denary is a base-10 system, with 10 symbols (the digits 0-9)
-
Binary is a base-2 system, with 2 symbols (0 & 1). Binary digits are referred to as bits, and all data is manipulated and stored in a computer using binary code
-
Hexadecimal (hex) is a base-16 system, with 16 symbols (0-9 & A-F). Hex is used for RGB colour codes, IPv6, MAC addresses, error codes
Hex Codes →
A = 10
B = 11
C = 12
D = 13
E = 14
F = 15
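The three number systems above can be converted between each other by repeated division by the base. The following is a minimal sketch of that idea; the function name `to_base` is made up for illustration, and Python's built-in `int(text, base)` is used for the reverse direction.

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative denary integer to a string in the given base (2 to 16)."""
    digits = "0123456789ABCDEF"  # A = 10 ... F = 15, as in the hex codes above
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # the remainder is the next digit, right to left
        out.append(digits[remainder])
    return "".join(reversed(out))


print(to_base(202, 2))     # 11001010
print(to_base(202, 16))    # CA
print(int("CA", 16))       # 202
print(int("11001010", 2))  # 202
```

Note that each hex digit corresponds to exactly one nibble: 1100 is C and 1010 is A.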
Binary Prefixes →
-
1 bit = 1 binary digit (0 or 1)
-
1 nibble = 4 bits
-
1 byte = 8 bits
-
1 kibibyte (KiB) = 1024 bytes
-
1 mebibyte (MiB) = 1024 KiB
-
1 gibibyte (GiB) = 1024 MiB
-
1 tebibyte (TiB) = 1024 GiB
Denary prefixes →
-
Kilo = 10^3
-
Mega = 10^6
-
Giga = 10^9
-
Tera = 10^12
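The difference between binary and denary prefixes matters when comparing sizes. The sketch below shows how the same number of bytes gives different figures in GB and GiB; the constant and function names are chosen for illustration only.

```python
# Binary prefixes: powers of 2
KIB = 2**10
MIB = 2**20
GIB = 2**30
TIB = 2**40

# Denary prefixes: powers of 10
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12


def bytes_to_gib(num_bytes):
    """Express a byte count in gibibytes."""
    return num_bytes / GIB


def bytes_to_gb(num_bytes):
    """Express a byte count in gigabytes."""
    return num_bytes / GB


drive = 500 * GB                      # a drive sold as "500 GB"
print(bytes_to_gb(drive))             # 500.0
print(round(bytes_to_gib(drive), 2))  # 465.66
```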
Binary Addition & Subtraction →
Addition →
-
0 + 0 = 0
-
1 + 0 = 1
-
1 + 1 = 10 (write 0, carry 1)
-
1 + 1 + 1 = 11 (write 1, carry 1)
Sometimes an error called overflow occurs: the answer needs more bits than the fixed number available, so it cannot be represented correctly
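The addition rules and the overflow condition can be sketched as a column-by-column adder. This is an illustrative example for unsigned numbers of equal width; the function name `add_binary` is an assumption, not part of any library.

```python
def add_binary(a: str, b: str):
    """Add two unsigned binary strings of equal width.

    Returns (sum truncated to the same width, overflow flag).
    """
    if len(a) != len(b):
        raise ValueError("both numbers must use the same number of bits")
    carry = 0
    bits = []
    # Work from the right-most (least significant) column to the left
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        bits.append(str(total % 2))  # digit written in this column
        carry = total // 2           # carry into the next column
    # A carry out of the left-most column means the answer does not fit
    return "".join(reversed(bits)), carry == 1


print(add_binary("0110", "0011"))  # ('1001', False)  6 + 3 = 9
print(add_binary("1100", "0111"))  # ('0011', True)   12 + 7 = 19 needs 5 bits
```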
Subtraction →
Convert the number being subtracted into its two's complement (negative) form, then add the two numbers and discard any carry out of the most significant bit. (see later in the chapter for two's complement)
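A minimal sketch of subtraction by adding the two's complement, assuming 8-bit values by default. The function name `subtract` and the `bits` parameter are illustrative choices.

```python
def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b by adding the two's complement of b in a fixed width."""
    mask = (1 << bits) - 1       # e.g. 11111111 for 8 bits
    neg_b = (~b + 1) & mask      # invert every bit of b, then add 1
    raw = (a + neg_b) & mask     # add, discarding any carry out of the MSB
    # If the MSB is set, the bit pattern represents a negative number
    if raw & (1 << (bits - 1)):
        return raw - (1 << bits)
    return raw


print(subtract(9, 5))  # 4
print(subtract(5, 9))  # -4
```

The result is only meaningful when it fits in the signed range for the chosen width (-128 to 127 for 8 bits).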
Internal Coding of Numbers →
Byte: 8 bits treated as a single unit. Values are 0 to 2^8 - 1 (0 to 255)
Unsigned integer: simply a binary number; always positive
Signed integer: can be positive or negative. The left-most bit is called the most significant bit (MSB) and is used to determine whether a number is positive or negative (1 = negative, 0 = positive)
Sign and Magnitude →
-
MSB is used for the sign (positive or negative) and the rest is the value
-
Range (8 bits): -127 to 127
-
Range is reduced because there are two zeroes (positive and negative zero)
-
Ordinary binary addition does not work directly on these bit patterns when the signs differ, so overflow & calculation errors occur
One’s Complement →
-
A negative number is formed by inverting every bit of the positive value (1s become 0s and 0s become 1s)
-
Range (8 bits): -127 to 127
-
Positive and negative zeroes, and calculation errors occur
Two’s Complement →
-
To negate a number: take the one's complement and add 1
-
Shortcut: working from the right, copy every bit up to and including the first 1, then invert all the remaining bits
-
Increased range (8 bits): -128 to 127
-
Only one zero
-
Easier calculations, with reduced errors
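The three signed representations above can be compared side by side. The sketch below assumes 8-bit numbers that lie within each representation's range; the function names are illustrative, and no range checking is done.

```python
def sign_magnitude(n: int, bits: int = 8) -> str:
    """MSB holds the sign, the remaining bits hold the magnitude."""
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")


def ones_complement(n: int, bits: int = 8) -> str:
    """Negative numbers are the positive pattern with every bit inverted."""
    mask = (1 << bits) - 1
    if n >= 0:
        return format(n, f"0{bits}b")
    return format(~abs(n) & mask, f"0{bits}b")


def twos_complement(n: int, bits: int = 8) -> str:
    """Negative numbers are the one's complement plus 1."""
    mask = (1 << bits) - 1
    return format(n & mask, f"0{bits}b")


for value in (5, -5):
    print(value, sign_magnitude(value), ones_complement(value), twos_complement(value))
# 5 00000101 00000101 00000101
# -5 10000101 11111010 11111011

print(twos_complement(-128))  # 10000000, a value the other two forms cannot hold in 8 bits
```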
Binary Coded Decimal →
Uses four bits (one nibble) to represent each denary digit
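As a minimal sketch of this digit-by-digit encoding, the example below converts a non-negative denary integer to BCD and back. The function names and the space-separated nibble format are assumptions made for readability.

```python
def to_bcd(n: int) -> str:
    """Encode each denary digit of a non-negative integer as a 4-bit nibble."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_bcd(bcd: str) -> int:
    """Decode space-separated nibbles back into a denary integer."""
    digits = []
    for nibble in bcd.split():
        value = int(nibble, 2)
        if value > 9:
            # 1010 to 1111 do not stand for any denary digit
            raise ValueError(f"{nibble} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(259))                 # 0010 0101 1001
print(from_bcd("0010 0101 1001"))  # 259
```

For comparison, 259 in pure binary is 100000011 (9 bits), while BCD needs 12 bits for the same value.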
Pros →
-
Easy to convert between denary and BCD, which makes it simple to encode and decode
-
Easier to understand and implement in hardware
-
Can represent large numbers or monetary values accurately
-
The bit patterns 1010-1111 (not needed for the digits 0-9) can be used for other characters
Cons →
-
There is no standard for the bit patterns 1010-1111
-
Less storage-efficient than pure binary, as 6 of the 16 possible nibble patterns are not used for digits
-
Complicates calculations
Uses →
-
Financial institutions - representing monetary values
-
Electronic calculators and LED displays
-
Date and time in BIOS of PC
-
Latitude and longitude
-
Barcodes (MSI)
-
Accurate representation of decimals and fractions; more generally, any use where denary digits are electronically coded
Coding of Text →
Text coding needs a character set: the complete set of symbols that computer hardware and software can recognise and use. Each symbol is represented by a unique code (a bit pattern or natural number).
ASCII →
-
7 bits per character, giving 128 characters. Extended ASCII uses 8 bits, giving 256 characters
-
Only supports characters from the English language, so its main downside is that it can't represent other languages
-
Punctuation marks, uppercase letters and lowercase letters each have their own codes
-
When a key is pressed, the binary number assigned to that key is sent to the computer. The CPU uses the ASCII character set to convert the binary number to a character, which is displayed on the screen
Unicode →
-
16 bits per character in its original form, giving 2^16 (65,536) possible characters; later versions extend beyond this
-
The first 128 characters are the same as ASCII, so Unicode is a superset of ASCII
-
Standardised
-
Greater range of characters, covering most modern languages, but 16-bit codes need more storage per character than ASCII, even for English text
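The link between characters and their codes can be explored with Python's built-in `ord` and `chr`. In this sketch the helper `char_to_bits` is a made-up name, and the `utf-16-be` encoding is used as a stand-in for "16 bits per character" (an assumption, since these notes do not name a specific Unicode encoding).

```python
def char_to_bits(ch: str, width: int = 7) -> str:
    """Return the code of a single character as a fixed-width bit string."""
    return format(ord(ch), f"0{width}b")


print(ord("A"))           # 65
print(char_to_bits("A"))  # 1000001
print(char_to_bits("a"))  # 1100001  (uppercase and lowercase have different codes)
print(chr(97))            # a

# A character outside the 128 ASCII codes
e_acute = "\u00e9"
print(ord(e_acute) < 128)  # False

# Storage needed for the same English letter
print(len("A".encode("ascii")))      # 1 byte
print(len("A".encode("utf-16-be")))  # 2 bytes
```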
1.2 Multimedia
Coding of Images →
Vectors →
-
A vector graphic stores a set of instructions and mathematical formulae describing how to draw each object. It consists of drawing objects defined in a drawing list, described by maths & geometry rather than by pixels
-
Typically used for geometric objects
-
Can group individual elements
-
Needs to be rasterised to display and print
-
Image can be enlarged without becoming pixelated, because the stored instructions are simply carried out again at the larger scale. The same image can therefore be used on many screen resolutions
-
Smaller file size, as it stores instructions instead of pixels, so it is faster and uses less bandwidth to upload and download
-
Contains:
-
Drawing List: commands to define each object
-
Geometric shapes/lines
-
Coordinates
-
Commands, formulae and attributes for each object e.g. colour, thickness, width
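To illustrate why a vector graphic scales cleanly, the sketch below models a drawing list as a list of dictionaries and enlarges it by multiplying the geometry. The property names (`centre`, `radius`, `start`, `end` and so on) are invented for this example and do not come from any real file format.

```python
drawing_list = [
    {"shape": "circle", "centre": (50, 50), "radius": 20, "colour": "red", "thickness": 2},
    {"shape": "line", "start": (0, 0), "end": (100, 40), "colour": "black", "thickness": 1},
]


def scale(objects, factor):
    """Return a new drawing list with all coordinates and radii multiplied by factor."""
    scaled = []
    for obj in objects:
        new_obj = dict(obj)  # copy so the original drawing list is unchanged
        for key, value in obj.items():
            if isinstance(value, tuple):  # coordinate pairs
                new_obj[key] = tuple(v * factor for v in value)
            elif key == "radius":
                new_obj[key] = value * factor
        # Attributes such as colour and thickness are left as they are in this sketch
        scaled.append(new_obj)
    return scaled


bigger = scale(drawing_list, 2)
print(bigger[0]["centre"], bigger[0]["radius"])  # (100, 100) 40
print(len(bigger) == len(drawing_list))          # True: same number of instructions
```

The enlarged image holds exactly as many objects as the original, which is why file size does not grow with the display size.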
Bitmaps →
-
Consists of pixels (picture elements), which are the smallest identifiable components of a bitmap, each defined by its position and colour. The pixels are arranged in a matrix
-
Larger file size
-
Enlarging makes the image appear pixelated
-
Can be compressed with a significant reduction in file size
-
Suitable for photos and scanned images
-
Needs less processing power to display than a vector graphic
-
Can’t group individual elements
-
Colour depth = number of bits used for each pixel. A colour depth of n bits allows 2^n colours
-
File size (in bits) = width × height × colour depth, with width and height in pixels; divide by 8 for bytes
-
How they are encoded:
-
Each image is split into pixels, which form a grid
-
The colour of each pixel is given a binary value
-
The sequence of binary values is stored
-
The data is saved in a suitable file format, together with metadata
-
An analogue image is converted to digital with an ADC (analogue to digital converter)
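The colour depth and file size formulas from the list above can be checked with a short calculation. This sketch ignores the file header and any compression, and the function names are illustrative.

```python
def number_of_colours(colour_depth: int) -> int:
    """n bits per pixel allow 2^n different colours."""
    return 2 ** colour_depth


def bitmap_size_bits(width: int, height: int, colour_depth: int) -> int:
    """Raw image data size in bits: width x height x colour depth."""
    return width * height * colour_depth


bits = bitmap_size_bits(1920, 1080, 24)
print(number_of_colours(24))  # 16777216
print(bits)                   # 49766400 bits
print(bits // 8)              # 6220800 bytes
print(round(bits / 8 / 2**20, 2))  # 5.93 MiB
```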
File header →
-
A file header is a set of bytes at the beginning of a file that identifies the file type, helps confirm the file is not damaged and tells the OS how to handle it
-
Stores metadata about the file
-
Contains:
-
Confirmation of file type
-
File size
-
Dimensions & resolution
-
Colour depth
-
Compression
Resolutions →
Image resolution: the amount of detail in an image. It is the total number of pixels in the image (width × height); pixel density is measured in dots per inch (dpi)
Screen resolution: a monitor specification. It is the number of pixels the screen can display, also given as width × height
Coding of Sound →
Sound is analogue (a continuously varying physical quantity) and must be converted to digital (binary 0s and 1s) using an ADC (analogue to digital converter)
-
Sampling: recording the amplitude of the analogue sound wave at regular intervals to approximate the wave. Samples are encoded as binary values and stored in the order in which they were taken
-
Sampling rate: number of samples taken per second, measured in hertz (Hz). Increasing it improves accuracy but also increases file size
-
Sampling resolution: the number of bits used to encode each sample, which sets the number of distinct amplitude values available (n bits give 2^n values). Also known as bit depth, and typically 8, 16, 24 or 32 bits. A higher resolution increases accuracy and file size, and reduces quantization error and distortion
-
Band-limiting filter: a component of the sound encoder which removes high frequency components we don’t hear
-
Quantization: rounding each sampled amplitude to the nearest value that the sampling resolution can represent. The difference between the true amplitude and the stored value is the quantization error
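The effect of sampling rate and sampling resolution can be sketched by sampling a pure sine wave. In this illustration each sample is rounded to one of the 2^n levels allowed by the bit depth, which is the source of the quantization error mentioned above. The function name and the choice of a sine wave are assumptions made for the example.

```python
import math


def sample_wave(freq_hz, sample_rate, bit_depth, duration_s):
    """Sample a sine wave and store each sample as an integer level."""
    levels = 2 ** bit_depth                 # number of distinct values available
    count = int(sample_rate * duration_s)   # total number of samples taken
    samples = []
    for i in range(count):
        t = i / sample_rate                              # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # analogue value, -1 to 1
        # Map -1..1 onto 0..levels-1 and round to the nearest whole level
        level = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(level)
    return samples


coarse = sample_wave(freq_hz=1, sample_rate=8, bit_depth=3, duration_s=1)
fine = sample_wave(freq_hz=1, sample_rate=32, bit_depth=8, duration_s=1)
print(len(coarse), len(fine))  # 8 32
print([format(s, "03b") for s in coarse])  # each sample as a 3-bit binary value
```

Raising the sampling rate gives more samples per second, and raising the bit depth gives more levels per sample; both bring the stored values closer to the original wave.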
File size (in bits) = sampling rate (Hz) × sampling resolution (bits) × time (seconds)
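A quick calculation using the sound file size formula. The `channels` parameter is an extra added for this sketch (1 for mono, 2 for stereo) and is not part of the formula as stated; the function name is illustrative.

```python
def sound_size_bits(sample_rate_hz, bit_depth, seconds, channels=1):
    """Uncompressed sound size in bits: rate x resolution x time (x channels)."""
    return sample_rate_hz * bit_depth * seconds * channels


# One minute of mono sound at 44,100 Hz and 16 bits per sample
mono = sound_size_bits(44100, 16, 60)
print(mono)       # 42336000 bits
print(mono // 8)  # 5292000 bytes

# The same recording in stereo is twice the size
stereo = sound_size_bits(44100, 16, 60, channels=2)
print(stereo // 8)  # 10584000 bytes
```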
1.3 Compression
Compression →
Compression is needed because data files can be very large: they would take a long time or a lot of bandwidth to send, and email services may limit the size of attachments
Lossy →
-
Some information/data is lost, so the file cannot be exactly reconstructed
-
Some data is deemed redundant and permanently removed
-
Results in loss of quality
-
Max compression: file can be reduced to around 10% of its original size
-
Sound: keeps sounds the human ear can process and discards what we can’t. removes background noise and frequencies above human hearing
-
Images: reduce colour depth so there are fewer bits per pixel, or reduce resolution so there are fewer pixels overall, e.g. JPEG. The difference is often unnoticeable to the human eye
-
MP3: uses psycho-acoustic modelling and perceptual music shaping. Certain parts of the sound are eliminated without significantly degrading the listener's experience: sounds the human ear can't hear are removed, and when two sounds play at the same time the softer one is discarded
Lossless →
-
Relies on some form of replacement, e.g. replacing repeated data with a shorter code
-
Subsequent decoding can exactly recreate the file
-
Loses none of the original data
-
Max compression: file can be reduced to around 50% of its original size
-
Run-Length Encoding (RLE): a lossless compression algorithm. It identifies repeating sequences (runs) and encodes each one as two values: what is being repeated (run value) and how many times it is repeated (run count). The pair may be preceded by a control character
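A minimal sketch of run-length encoding on a string, storing each run as a (run value, run count) pair. The function names are illustrative, and the optional control character is left out to keep the example short.

```python
def rle_encode(data: str):
    """Encode a string as a list of (run value, run count) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1       # same value as the current run: extend it
        else:
            runs.append([ch, 1])   # different value: start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs) -> str:
    """Rebuild the original string exactly from the pairs."""
    return "".join(value * count for value, count in runs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Decoding recreates the input exactly, which is what makes the method lossless. Data with few repeats (such as the single D here) gains nothing and can even grow, since every run needs two values.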